The COVID-19 pandemic has shaken up society considerably. Many people are directly or indirectly affected by the health crisis or the resulting governmental measures. This led to adjustments such as social distancing and isolation, which changed how people communicate, work, and go about daily life. This dashboard explores the following question:
Did Spotify users in the Netherlands change their music listening behavior during the COVID-19 pandemic?
A corpus has been created in order to perform various computational musicological analyses using the spotifyr and compmus packages.
The general listening behavior of Spotify users in the Netherlands before and during the pandemic will be explored, as measured by the Spotify API. In addition, specific events related to the pandemic (e.g. lockdown and curfew) will be considered to determine to what extent possible changes in listening behavior can be attributed to them.
In order to analyze general listening behavior, the most important variables for the portfolio are:
In order to keep track of the average listening behavior of Dutch Spotify users, the weekly ‘Top 50’ playlists from the Netherlands will be analyzed over time. The years 2019 (52 weeks) and 2020 (53 weeks) are measured in their entirety, and 2021 is measured until week 7.
2019 contains 52 playlists consisting of 50 tracks per playlist
2020 contains 53 playlists consisting of 50 tracks per playlist
2021 contains 7 playlists consisting of 50 tracks per playlist
This totals 5,600 observations/tracks. As a track can be in the charts for multiple weeks, duplicates occur. The number of unique tracks within the corpus is 826.
Since Spotify automatically updates its playlists, the historical ‘Top 50’ lists are retrieved from Spotify Charts in the form of CSV files.
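As a rough illustration of how the weekly CSV files could be stacked into one corpus, the sketch below reads a few in-memory charts and counts observations and unique tracks. The column names follow the variables listed in this dashboard (Position, Track Name, Artist, Streams), but the exact layout of the real Spotify Charts files, the `WEEKLY_CSVS` data and the `load_corpus` helper are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical weekly chart exports; the real files would hold 50 rows each.
WEEKLY_CSVS = {
    (2019, 1): "Position,Track Name,Artist,Streams\n1,Song A,Artist X,100\n2,Song B,Artist Y,90\n",
    (2019, 2): "Position,Track Name,Artist,Streams\n1,Song B,Artist Y,120\n2,Song C,Artist Z,80\n",
}

def load_corpus(weekly_csvs):
    """Stack all weekly charts into one list of rows tagged with year and week."""
    rows = []
    for (year, week), text in sorted(weekly_csvs.items()):
        for row in csv.DictReader(io.StringIO(text)):
            rows.append({
                "year": year,
                "week": week,
                "position": int(row["Position"]),
                "track": row["Track Name"],
                "artist": row["Artist"],
                "streams": int(row["Streams"]),
            })
    return rows

corpus = load_corpus(WEEKLY_CSVS)

# A track that charts in several weeks appears several times, so unique
# tracks are counted on the (track, artist) pair.
unique_tracks = {(r["track"], r["artist"]) for r in corpus}
print(len(corpus), len(unique_tracks))

# Corpus size implied by the playlist counts stated above.
expected_observations = (52 + 53 + 7) * 50
```

With the full set of 112 weekly files, `len(corpus)` should match `expected_observations`, which evaluates to 5,600.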
The changes in listening behavior (or lack thereof) will be measured by the different Spotify Audio Features:
Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
The key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.
The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
The duration of the track in milliseconds.
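The speechiness thresholds described above can be turned into a small labelling helper. This is only a sketch: the function name `speechiness_category` and the label strings are made up here, and how values exactly equal to 0.33 or 0.66 are treated is an assumption, since the description does not say.

```python
def speechiness_category(value):
    """Map a speechiness score (0.0 to 1.0) to the three bands described above."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("speechiness must lie between 0.0 and 1.0")
    if value > 0.66:
        return "spoken word"        # probably made entirely of spoken words
    if value >= 0.33:
        return "music and speech"   # e.g. rap, or speech layered over music
    return "music"                  # most likely music or other non-speech audio

print(speechiness_category(0.04))
print(speechiness_category(0.50))
```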
The following variables obtained through the Spotify API will also be included:
Number of streams
Position
Track Name
Artist
Streams
The variable time will be used to identify the different weeks as well as the periods before and during the pandemic that may explain the changes in music listening behavior from the top and viral playlists.
In addition, interesting annual periods will be isolated to see if similar patterns recur during the pandemic. For example, the December holiday season before and during the pandemic will be analyzed to identify whether Spotify users altered their Christmas-related listening behavior.
The corpus runs from week 1, 2019 until week 7, 2021, and the data is split into two periods: Before the pandemic and During the pandemic. This makes it easier to attribute findings to these periods, rather than to individual years or weeks.
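A minimal sketch of such a split is shown below. The exact cut-off week used for the dashboard is not stated, so the boundary here (week 9 of 2020, the week in which the first Dutch COVID-19 case was confirmed) is an assumption, as are the names `PANDEMIC_START` and `period`.

```python
# Assumed boundary: week 9 of 2020, the week of the first confirmed Dutch case.
PANDEMIC_START = (2020, 9)

def period(year, week):
    """Label a chart week as 'Before' or 'During' the pandemic."""
    # Tuples compare element by element, so (year, week) sorts chronologically.
    return "During" if (year, week) >= PANDEMIC_START else "Before"

print(period(2019, 52))
print(period(2021, 7))
```

Keeping the boundary in one constant makes it easy to rerun the analyses with a different definition, for example the start of the first lockdown.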
Alongside the musical analyses, statistics concerning COVID-19 will be taken into account as well. The data used is provided by the Dutch National Institute for Public Health and the Environment (RIVM). The data has been pre-processed to include both weekly and cumulative data. The following variables are included in this dashboard:
Number of Hospital Admissions
Number of Deaths
Reported cases of COVID-19
On February 27th, 2020 (week 9), the first case of COVID-19 was confirmed in the Netherlands. Even before this (when everything was still normal), from week 4 of 2020, the streams of the top songs had been decreasing. When the first COVID-19 cases, admissions and deaths were confirmed, the number of streams continued to decline until the cases started to increase more rapidly, after which streams rose. What caused this increase?
As the situation became more severe, with record COVID-related hospital admissions in week 13, the Dutch government implemented the first lockdown: the ‘intelligent’ lockdown (first dotted line). This led to relatively high public solidarity towards those affected by the virus, especially essential workers and the elderly. This may explain the sudden spike in streams, rising from 1,598,458 to a record 3,482,822 streams in weeks 13 and 14 of 2020. The song 17 Miljoen Mensen - Live @538 in Ahoy by Davina Michelle topped the charts for five consecutive weeks. This song was dedicated to the people affected by the virus, so this spike is most likely related to the virus and its effects. Although an increase in streams is shown during the 2nd and 3rd lockdowns as well, the measures may not be the cause, as the increase is in line with the annual trend.
Dutch Spotify users in general are fairly consistent when it comes to streaming popular songs. Looking at 2019, 2020 and the first 7 weeks of 2021, a recurring pattern can be found. Streams increase during the first weeks, followed by a sharp decline. During summertime, streams remain relatively stable. Streams then decrease before the holiday season, with a short spike at Christmas. This shows ‘procyclical’ behavior, simply meaning that popular songs on average are correlated with a general trend (a term repurposed from economics).
Not all Spotify users/songs follow the same trend. In some cases the status quo is defied by a large margin. A couple of cases will be introduced that will be further analyzed.
The most streamed song before the pandemic (also within the corpus) is ‘Arcade’ by Duncan Laurence, with 5,473,513 streams in week 21, 2019. This is due to the song winning the 2019 Eurovision Song Contest for the Netherlands.
As mentioned earlier, the track ‘17 Miljoen Mensen’ is also an outlier, but related to the pandemic. Another track that might be related is ‘All I Want for Christmas Is You’ by Mariah Carey. As it was quite strange celebrating Christmas during a pandemic, people were streaming Christmas songs more and earlier in 2020 than in 2019.
So, in general the popular songs in the Netherlands follow a periodic/cyclical trend.
The outliers are caused by a large societal impact, as explained by the togetherness during the first wave of the pandemic, the Netherlands winning Eurovision and people listening to Christmas songs more and earlier.
At the same time, Dutch people ‘recover’ fairly quickly to the regular level. The second and third waves of COVID-19 and the related measures imposed by the Dutch government did not have a similar impact to the first. Rather, these followed a similar pattern to 2019.
In this frame you can make week-by-week comparisons for different variables based on the period before or during the pandemic.
The interesting variables for comparing the different periods are valence and energy, as these reflect the valence/arousal model, which distinguishes the emotions Happy, Angry, Sad and Relaxed.
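The quadrants of this model can be sketched as a simple lookup on the two features, with energy standing in for arousal. The midpoint of 0.5 used as the boundary and the function name `emotion_quadrant` are assumptions for illustration; the dashboard plots may draw the quadrants differently.

```python
def emotion_quadrant(valence, energy, threshold=0.5):
    """Place a track in one of the four valence/arousal quadrants."""
    high_valence = valence >= threshold
    high_energy = energy >= threshold
    if high_valence and high_energy:
        return "Happy"
    if high_energy:
        return "Angry"      # low valence, high energy
    if high_valence:
        return "Relaxed"    # high valence, low energy
    return "Sad"            # low valence, low energy

print(emotion_quadrant(0.8, 0.7))
```

A hard threshold is crude for tracks near the centre of the plane, so the labels are best read as a rough grouping rather than a judgment about individual songs.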
The corpus is spread fairly evenly, and both before and during the pandemic most of the Top 50 tracks are in the Happy quadrant. This is not very surprising, as the Top 50 usually consists mostly of pop songs.
The weekly plots show that the average valence is much higher during the pandemic, and that the average weekly streams follow a similar trend, with some anomalies. These will be explored further down the line.
In the corpus there are a total of 35 distinct tracks topping the charts in a span of 112 weeks. On average, a number 1 track stays at the top for about 3.54 weeks, during which it is streamed 7,381,669 times in total, or 2,062,878 times per week at the number 1 spot.
This gives an interesting insight into the weekly most popular songs in this corpus. But which track is the most popular? Since the corpus is time-constrained, solely considering the number of streams might not show the full picture. Therefore, three variables are considered:
1. Total number of streams while on top
2. Number of weeks on top
3. Average streams per week while on top
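The three variables can be computed from the weekly chart rows as sketched below. The row layout (dictionaries with `track`, `position` and `streams` keys), the toy data and the function name `top_spot_stats` are assumptions for illustration, not the code behind the dashboard.

```python
from collections import defaultdict

def top_spot_stats(rows):
    """Compute v1 (total streams), v2 (weeks) and v3 (streams per week) at number 1."""
    totals = defaultdict(int)
    weeks = defaultdict(int)
    for row in rows:
        if row["position"] == 1:          # only weeks spent at the top count
            totals[row["track"]] += row["streams"]
            weeks[row["track"]] += 1
    return {
        track: {"v1": totals[track], "v2": weeks[track], "v3": totals[track] / weeks[track]}
        for track in totals
    }

# Toy data: one row per track per week.
rows = [
    {"track": "Song A", "position": 1, "streams": 100},
    {"track": "Song B", "position": 2, "streams": 95},
    {"track": "Song A", "position": 1, "streams": 140},
    {"track": "Song B", "position": 1, "streams": 90},
]
stats = top_spot_stats(rows)
print(stats["Song A"])
```

Because v3 is simply v1 divided by v2, a track with few weeks at the top can still rank first on v3 if those weeks were exceptionally strong.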
The most popular song according to variable 1 is “Dance Monkey” by Tones And I, with a total of 31,534,809 streams while at number 1. The other variables (v2 = 14 weeks | v3 = 2,252,486 per week) show that the track performs above average and can indeed be considered popular, both before and during the pandemic.
According to variable 2, “Dance Monkey” is again the most popular. In second place we find “Mood (feat. iann dior)” by 24kGoldn, which topped the charts for a total of 9 weeks. The other variables (v1 = 15,498,940 | v3 = 1,722,104 per week) show that it scores high on v1 but underperforms on v3. This track was popular during the pandemic.
Variable 3 shows that the Eurovision winner “Arcade” by Duncan Laurence is streamed the most per week while on top, with 4,274,571 streams. The track performs above average on v1 but underperforms on v2 (v1 = 8,549,142 | v2 = 2 weeks). Winning Eurovision (before the pandemic) caused a huge spike, but a rather temporary one.
During the pandemic, according to v3, “Tigers” by Bilal Wahib is streamed the most per week while at the top, with 2,652,339 streams. The track underperforms on the other variables (v1 = 5,304,679 | v2 = 2 weeks). Thus, this song can be considered a viral hit or, more harshly, a “one-hit wonder” for two weeks.
There are limitations to this approach, as only the number one spots are considered. For a clearer picture, a wider range is recommended. Even considering these variables for these top songs, most perform very similarly. Thus, it remains difficult to denote which song is the most popular.
The track “17 Miljoen Mensen” (2020) is a cover of “15 Miljoen Mensen” (1996). An analysis of the chroma features of the two tracks aims to find similarities between them. Notice the title of “17 Miljoen Mensen”, adjusted for the population increase of 2 million people, and its short duration of just 1 minute and 47 seconds. But what are the other differences or similarities?
The first plot shows the Dynamic Time Warping plot of the two tracks, using Euclidean norm and angular distance. A diagonal pattern would denote similarity between the two tracks. This is not observed, which implies significant differences between the two tracks. This is supported by the table, which shows that the pitch classes differ. According to the Spotify API, “17 Miljoen Mensen” is in the key of G major, whereas “15 Miljoen Mensen” is in the key of C major. This is not explicitly shown, but the keys are represented in their respective chromagrams.
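To make the idea behind the plot concrete, here is a minimal pure-Python sketch of dynamic time warping on short chroma-like sequences. It is not the routine used to produce the plot: the angular distance is taken here as the angle in radians between unit-length vectors, which is one common definition and an assumption, and the three-dimensional toy vectors stand in for 12-dimensional chroma vectors.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are left as they are)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length > 0 else list(vec)

def angular_distance(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

def dtw_cost(seq_a, seq_b, dist=angular_distance):
    """Total cost of the cheapest monotone alignment between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy sequences: 'slow' is a time-stretched version of 'fast'.
fast = [normalize(v) for v in [[1, 0, 0], [0, 1, 0]]]
slow = [normalize(v) for v in [[2, 0, 0], [2, 0, 0], [0, 3, 0], [0, 3, 0]]]
other = [normalize(v) for v in [[0, 0, 1], [0, 0, 1]]]

print(dtw_cost(fast, slow))   # same content at a different speed aligns at low cost
print(dtw_cost(fast, other))  # unrelated content stays expensive
```

A low-cost alignment corresponds to the clear diagonal path one would expect between similar recordings, while a transposed cover (G major versus C major) gives mismatching chroma vectors unless the chroma is shifted first.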
In addition, the ‘sound and feel’ of the tracks differ: “15 Miljoen Mensen” has a higher danceability, energy, and loudness, whereas “17 Miljoen Mensen” has a much higher acousticness and a somewhat higher liveness (due to the recording being a live performance).
A remarkable commonality probably explains the differences: both tracks were unintended single releases. “15 Miljoen Mensen” was initially written for a commercial, and “17 Miljoen Mensen” as a tribute for a music concert that was canceled due to COVID-19. The different motivations behind the tracks are reflected in the different ‘sound and feel’ shown by the Spotify API.
For a commercial you would want a more catchy, upbeat track, in contrast to a song related to a disaster or crisis. This explains the difference in loudness: “17 Miljoen Mensen” has a loudness of -10.041 dB, whereas “15 Miljoen Mensen” has a loudness of -7.063 dB.
| feature | 17 Miljoen Mensen (2020) | 15 Miljoen Mensen (1996) |
|---|---|---|
| danceability | 0.493 | 0.547 |
| energy | 0.321 | 0.631 |
| key | 7 | 0 |
| loudness | -10.041 | -7.063 |
| mode | 1 | 1 |
| speechiness | 0.0402 | 0.0266 |
| acousticness | 0.715 | 0.0943 |
| instrumentalness | 0 | 0 |
| liveness | 0.0863 | 0.0548 |
| valence | 0.508 | 0.481 |
| tempo | 86.77 | 79.02 |
| duration_sec | 107.2 | 236.107 |
| time_signature | 4 | 4 |
Christmas songs started to dominate the charts in 2020 from around week 49 until week 53, whereas in 2019 this phenomenon occurred a bit later. In 2020 it is noticeable that the bottom right corner contains tracks with relatively high BPM, high valence, lower energy and lower danceability. During these weeks Christmas tracks dominated the charts. In 2019 this phenomenon is very noticeable in week 52, although Christmas slowly started in week 50. Also, in 2020 the charts remained similar during the holiday period from week 50 to 53, whereas in 2019 week 52 saw a spike in the Christmas-related audio features. This pattern implies that more Christmas tracks entered the Top 50.
Interestingly, Mariah Carey’s ‘All I Want for Christmas Is You’ topped the charts for four consecutive weeks in 2020, as opposed to 1 week in 2019.
A possible explanation is that, due to the imposed lockdown and other restrictions, people may have felt a need or desire for the “Christmas Spirit/Vibes” a week earlier than in 2019.
Another interesting discovery is that the top streams in 2020 decreased in a similar fashion to 2019. A possible explanation is that people disregarded the lockdown regulations and spent the holiday season with friends and/or family, or were preoccupied with other activities to keep in touch with them.
“Dance Monkey” by Tones And I is one of the most popular tracks within the corpus. A structure analysis will show possible patterns of sequences within the track and their relation.
The first cepstrogram plot shows the magnitude of each timbre feature per segment of the track. The feature c01 is loudness, c02 is low frequencies, and c03 is mid frequencies. c04 and up are not defined as straightforwardly, but their meaning may be inferred by keeping track of changes during specific segments of a track. The cepstrogram shows that the timbre of “Dance Monkey” is largely defined by c01 to c05.
The second and third plots are Self-Similarity Matrices (SSM), one for pitch and one for timbre. These plots show the structure of a track by denoting patterns of similarity that recur. Diagonal lines and a checkerboard pattern show similarity and repetition.
The timbre SSM is plotted using Euclidean norm and Euclidean distance, summarized by the mean. The plot shows a faint checkerboard pattern, which implies some form of repetition in the track. At the 150 second mark there is a significant timbre difference. This is when the breakdown occurs with the earlier mentioned “Millennial Whoop”.
The pitch SSM is plotted using Euclidean norm, cosine distance and summarized by root mean square. This plot shows a slightly more noticeable checkerboard pattern. At the 150 second mark, the plot shows a significant change.
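The construction of such a matrix can be sketched in a few lines: every segment is compared with every other segment, and repeated sections show up as blocks of small distances. The example below uses Euclidean distance on toy two-dimensional feature vectors; the data and the helper names are assumptions, and a cosine distance could be passed in instead for the pitch case.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def self_similarity_matrix(segments, dist=euclidean_distance):
    """Pairwise distances between all segments of one track."""
    return [[dist(a, b) for b in segments] for a in segments]

# Toy track with an A-B-A-B structure (e.g. verse, chorus, verse, chorus).
segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
ssm = self_similarity_matrix(segments)

for row in ssm:
    print([round(value, 2) for value in row])
```

In the printed matrix the zeros alternate with larger values, which is the checkerboard pattern described above in miniature.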
The track Mood by 24kGoldn ft. iann dior is also one of the identified popular tracks in the corpus. A keygram and chordogram are plotted in order to show the tonal progression of the track by estimating the chords and key for each segment.
The keygram shows that E♭ major, G minor, F major, C major, G major, and C♯ minor are the prevalent keys during the track. The chordogram shows that C minor, E♭ 7, and E♭ major are the most prevalent chords of the track.
Spotify API
According to the Spotify API, this track is written in key 7 with mode 0, meaning G minor.
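The translation from the numeric key and mode to a key name follows the pitch class mapping given in the feature descriptions (0 = C, 1 = C♯/D♭, and so on; mode 1 = major, 0 = minor). A small sketch, with the helper name `key_name` chosen here for illustration:

```python
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]

def key_name(key, mode):
    """Translate a pitch class integer (0-11) and a mode flag (1 = major, 0 = minor)."""
    if not 0 <= key <= 11:
        raise ValueError("key must be a pitch class between 0 and 11")
    quality = "major" if mode == 1 else "minor"
    return f"{PITCH_CLASSES[key]} {quality}"

print(key_name(7, 0))  # G minor
print(key_name(0, 1))  # C major
```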
Chordify
The Chordify algorithm identified the chords within the following (4/4) loop:
The identified key appears to be on the natural minor scale:
G - A - B♭ - C - D - E♭ - F
The differences between the estimated chords are due to the different audio chord estimation algorithms that the Spotify API and Chordify use. While Chordify is not considered perfect, it is considered ‘good enough’ to be useful. There are differences but, as seen above, there is some overlap between the two.
While the histogram doesn’t show a clear or unanimous preference, the keys C♯, F♯ and G♯ consistently have a relatively high count within the corpus.
In 2019, there is a clearly higher count of the keys C♯, F, G and B.
In 2020, the keys C♯, F♯, G♯ and B have a noticeably higher frequency in the corpus.
Note that 2021 only contains the first 7 weeks, whereas 2019 and 2020 contain 52 and 53 weeks respectively. Therefore, it is not representative enough for a well-informed comparison.
The density plot shows that overall the most frequent tempi within the corpus are around 90-100 BPM and 115-128 BPM. The year 2019 showed a strong preference for tracks around 98 BPM and, to a lesser extent, 123 BPM.
The year 2020 showed a strong preference for tracks around both 95 BPM and 121 BPM.
The first 7 weeks of 2021 showed a preference for tracks around 99 BPM and 122 BPM.
Another popular track in the corpus is “Tigers” by Bilal Wahib. Tempograms are plotted in order to show the estimated BPM of the track along its duration.
The tempo feature of the Spotify API estimates a BPM of 111.943 (rounded 112 BPM).
The first tempogram doesn’t explicitly reflect the estimation of the Spotify API: tempi of around 210-220 and 430-450 BPM are shown in the plot. The plot might be picking up the double-time and quadruple-time tempi of 224 and 448 BPM (based on the estimation of 112 BPM).
The second tempogram (cyclic) is adjusted to represent the more ‘common’ tempi at which humans tap. This plot reflects the Spotify API estimation of 112 BPM more clearly.
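The idea behind the cyclic view is that tempi differing by a factor of two are treated as the same pulse. A minimal sketch of this folding is given below; the reference range of 80 to 160 BPM and the function name `fold_tempo` are assumptions chosen for illustration, not parameters taken from the plot.

```python
def fold_tempo(bpm, low=80.0):
    """Halve or double a tempo until it falls in the range [low, 2 * low)."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    high = 2 * low
    while bpm >= high:
        bpm /= 2
    while bpm < low:
        bpm *= 2
    return bpm

# Tempo multiples of the estimated 112 BPM all fold back onto the same value.
for candidate in (224, 448, 56):
    print(candidate, "->", fold_tempo(candidate))
```

Under this folding, the bands around 210-220 and 430-450 BPM in the first tempogram land close to the Spotify estimate of 112 BPM.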
At the 75 second mark, there is a slight drop and increase in tempo. This is where the tape stop sound effect is heard, immediately followed by the bridge. The BPM, however, remains the same (try to tap along).
The data has been heavily reduced in order to run the machine learning algorithms without crashing. A subset containing the top 3 tracks per week is selected, totaling 336 observations that are split across the periods before and during the pandemic. This makes it possible for the algorithm to predict whether a track belongs to the period before or during COVID-19. Ten-fold cross-validation is used. Both the knn and tree methods are very accurate, with precision and recall above 90%.
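For readers unfamiliar with ten-fold cross-validation, the sketch below shows how 336 observations could be divided so that every observation is held out exactly once. It is a simplified illustration with a hypothetical helper `k_fold_indices`: real implementations usually shuffle (and often stratify) the data first, and this is not the routine used for the dashboard.

```python
def k_fold_indices(n_observations, k=10):
    """Yield (train, test) index lists, holding out each fold exactly once."""
    # Deal the indices out round-robin so fold sizes differ by at most one.
    folds = [list(range(start, n_observations, k)) for start in range(k)]
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

splits = list(k_fold_indices(336, k=10))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

Each model is then fitted on the training indices and evaluated on the held-out fold, and the predictions of all ten folds are pooled into one confusion matrix.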
The confusion matrix shows how accurately the algorithm predicts the classes. In this portfolio we’d like to find out whether we can classify songs as belonging to the period before or during the pandemic. Looking at the Truth and Prediction, the algorithm performs very well on the subset data.
The random forest ranks the importance of the different features that can be attributed to the classification of the songs. In this case we find that the variables Streams, G, c06, G#|Ab, B are the most important in this corpus.
knn precision - recall
| class | precision | recall |
|---|---|---|
| Before COVID-19 | 0.9607843 | 0.9245283 |
| During COVID-19 | 0.9344262 | 0.9661017 |
tree precision - recall
| class | precision | recall |
|---|---|---|
| Before COVID-19 | 0.9320388 | 0.9056604 |
| During COVID-19 | 0.9173554 | 0.9406780 |
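Precision and recall follow directly from the counts in a confusion matrix. The sketch below shows the calculation; the counts are hypothetical values chosen so that the ratios reproduce the knn table above, since the actual confusion matrix is not listed in the text, and `precision_recall` is a helper name made up for this example.

```python
def precision_recall(confusion, labels):
    """confusion[i][j] = number of tracks with true class i predicted as class j."""
    result = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        predicted = sum(row[k] for row in confusion)  # column total
        actual = sum(confusion[k])                    # row total
        result[label] = (true_positive / predicted, true_positive / actual)
    return result

labels = ["Before COVID-19", "During COVID-19"]
# Hypothetical counts (rows: truth, columns: prediction), not taken from the source.
confusion = [[98, 8],
             [4, 114]]

scores = precision_recall(confusion, labels)
for label, (precision, recall) in scores.items():
    print(label, round(precision, 7), round(recall, 7))
```

Precision answers how many of the tracks predicted for a period really belong to it, while recall answers how many of the tracks from a period were found.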
We’ve seen that the pandemic did have a significant effect on society. This was also reflected in the spikes in the number of streams for specific tracks, shown in the earlier plots by the following:
- Togetherness and solidarity at the beginning of the pandemic
- Earlier and longer Christmas
We’ve seen that Dutch Spotify users react relatively quickly, and only for a short period; this holds both before and during the pandemic. This was clearly seen in the Eurovision and 17 Miljoen Mensen examples. This hasn’t changed, which is also reflected in the fact that the subsequent corona waves did not meet with similar solidarity as the first.
However, on average the number of streams remained fairly stable. Although the pattern was similar to that before the pandemic, we did find that people listened a bit more to ‘happier’ music. Interestingly, this phenomenon, combined with the fact that people tend to be “coronamoe” (corona fatigued), may suggest that people in the Netherlands are happier despite the pandemic.
All in all it can be concluded that the pandemic did have an impact, but at the same time the Dutch are quick to move on.
I hope that the pandemic will be behind us in the near future, which would allow us to analyze post-pandemic music listening behavior!